Abstract. We study global optimization (GOP) in the framework of non-linear inverse
problems with a unique solution. These problems are in general ill-posed. Evaluation
of the objective function is often expensive, as it implies the solution of a non-trivial
forward problem. The ill-posedness of these problems calls for regularization, while the
high evaluation cost of the objective function can be addressed with response surface
techniques. Global optimization using Radial Basis Functions (RBFs), as presented by
Gutmann, is a response surface global optimization technique with regularizing aspects.
Alternatively, several publications put forward global optimization using a probabilistic
approach based upon Kriging as an efficient technique for non-linear multimodal objective
functions, one that also provides a credible stopping rule [H]. After comparing both concepts,
we argue that for non-linear inverse problems an adaptation of the RBF algorithm
seems to be the most promising approach.
1. Introduction

The global optimization (GOP) technique using response surfaces developed in [6] and […]
addresses problems where the only information available is the ability to evaluate the
objective function at high cost. The main idea is to replace the objective function by
a response surface that interpolates it over a finite set of evaluation points at which its
value is known. The response surface is assumed to approximate the objective function and
is used as a surrogate for it. A naive idea is to search for the global minimum of the
response surface, evaluate the objective function at this minimum, incorporate the new data
into the response surface, and then repeat the entire procedure with the updated response
surface. Unfortunately, the resulting sequence often does not contain a subsequence
converging to a global minimum of the objective function. A concentration of evaluation
points can cause "extrapolation" type errors in regions where the evaluation points are less
dense. Due to this phenomenon, the response surface can bridge an unexplored valley of the
objective function and hide a possible global minimum there. A solution may consist of
finding a good balance between minimum search and global exploration. Minimum search will
result in a good approximation of the global minimum if the region of the global minimum is
correctly represented by the response surface, and if this surface does not introduce minima
lower than the global minimum. In order to guarantee a trustworthy representation, a more
or less uniform coverage of the whole domain of interest should be generated.



Response Surface GOP, the Basic Algorithm:

• Choose a number of initial evaluation points.
  Evaluate the objective function at the evaluation points.
  Create a surrogate function, called the response surface, based upon the evaluation
  points and their evaluations; it should approximate the function to be minimized.

• (1) Search for a new evaluation point using the response surface and the positions
  of the evaluation points, balancing the scanning of unexplored areas against belief in
  the response surface as a reliable surrogate.
  (2) Evaluate the objective function at this new evaluation point, and update the
  surrogate function.

• Repeat (1) and (2) until satisfaction or exhaustion.
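The basic scheme can be sketched in Python as a loop with two pluggable pieces: a routine that builds the surrogate and a rule that picks the next point. This is only a minimal sketch under stated assumptions: all helper names are hypothetical, the domain is replaced by a finite 1-D candidate set, "satisfaction or exhaustion" is reduced to a fixed evaluation budget, and the toy surrogate and selection rule in the usage part are placeholders, not the RBF or Kriging choices discussed in the following sections.

```python
from typing import Callable, List, Sequence, Tuple

Surrogate = Callable[[float], float]


def response_surface_gop(
    f: Callable[[float], float],
    initial: Sequence[float],
    candidates: Sequence[float],
    fit: Callable[[List[float], List[float]], Surrogate],
    select: Callable[[Surrogate, List[float], Sequence[float]], float],
    budget: int,
) -> Tuple[float, float, List[float]]:
    """Generic response-surface loop (hypothetical helper, 1-D for brevity)."""
    xs = list(initial)
    ys = [f(x) for x in xs]                        # initial (expensive) evaluations
    while len(xs) < budget:                        # "exhaustion": fixed budget
        surrogate = fit(xs, ys)                    # create / update response surface
        x_new = select(surrogate, xs, candidates)  # step (1): pick a new point
        xs.append(x_new)
        ys.append(f(x_new))                        # step (2): evaluate the objective
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], xs


# --- Placeholder choices for illustration only (neither RBF nor Kriging) ---
def nearest_neighbour_fit(xs: List[float], ys: List[float]) -> Surrogate:
    """Crude surrogate: the value at the closest evaluated point."""
    return lambda x: ys[min(range(len(xs)), key=lambda i: abs(x - xs[i]))]


def balanced_select(s: Surrogate, xs: List[float], candidates: Sequence[float],
                    weight: float = 1.0) -> float:
    """Trade a low surrogate value against the distance to existing points."""
    fresh = [c for c in candidates if c not in xs]
    return min(fresh, key=lambda c: s(c) - weight * min(abs(c - x) for x in xs))


best_x, best_y, history = response_surface_gop(
    f=lambda x: (x - 0.3) ** 2,
    initial=[0.0, 1.0],
    candidates=[i / 20 for i in range(21)],
    fit=nearest_neighbour_fit,
    select=balanced_select,
    budget=8,
)
```

At this level of abstraction the RBF and Kriging variants differ only in what is passed as `fit` and `select`; the expensive call to `f` happens exactly once per iteration.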

This scheme can be used with either an RBF or a Kriging approach. The RBF algorithm
and Kriging differ in the interpolating function chosen and in the way in which
they balance exploration and minimum search. In what follows, a concise description of
the standard RBF and the Kriging based GOP is given, followed by a comparison
with non-linear inverse problems in mind.
2. Response Surface GOP using Radial Basis Functions



The Radial Basis Function approach consists in constructing a linear space from which
the interpolating functions are chosen, depending on the arguments, i.e. the positions
of the data points. One way to introduce this dependency is to use a linear combination of
functions with a radial symmetry about each of the $x_i$:

$$\sum_{i=1}^{n} \lambda_i\, \varphi(\|x - x_i\|),$$

where $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^d$ and $\varphi$ is a function $[0, \infty[ \to \mathbb{R}$.
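Writing $s(x)$ for this combination, imposing the interpolation conditions at the data points turns the choice of the coefficients $\lambda_i$ into a linear system. A short derivation for the form above, without a polynomial term:

$$s(x_j) = \sum_{i=1}^{n} \lambda_i\, \varphi(\|x_j - x_i\|) = f(x_j), \quad j = 1, \dots, n \qquad\Longleftrightarrow\qquad \Phi \lambda = F, \quad \Phi_{ji} = \varphi(\|x_j - x_i\|), \quad F_j = f(x_j).$$

The matrix $\Phi$ is symmetric, and for the Gaussian and the multiquadric it is nonsingular whenever the points are pairwise distinct, so that $\lambda = \Phi^{-1} F$.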

Analytic results exist for the interpolation problem using this form, e.g. for the "multiquadric"
$\varphi(r) = \sqrt{r^2 + \gamma^2}$, $\gamma > 0$, and the "Gaussian" $\varphi(r) = e^{-\gamma r^2}$, $\gamma > 0$, presented by
Micchelli [H]. Despite these results, it is useful to allow the addition of a polynomial of
low degree. This extends the theory to include "thin plate splines" and increases the
number of possible $\varphi$'s that can be used [P].

Definition 1. Given $\{x_i \mid i = 1, \dots, n\}$, $n$ pairwise distinct points in $\mathbb{R}^d$, an RBF interpolation
function on $\mathbb{R}^d$ is a function of the form

$$s : \mathbb{R}^d \to \mathbb{R} : x \mapsto s(x) = \sum_{i=1}^{n} \lambda_i\, \varphi(\|x - x_i\|) + p(x),$$

with $\|\cdot\|$ the Euclidean norm in $\mathbb{R}^d$, $\varphi$ a function $[0, \infty[ \to \mathbb{R}$, and $p$ a polynomial of low
degree, with a limitation depending on the type of $\varphi$.
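With a polynomial term, the interpolation conditions alone no longer fix all coefficients. A common completion, assumed here because the text does not spell it out, is to require $\sum_i \lambda_i q(x_i) = 0$ for every polynomial $q$ of the chosen degree, which yields a square block system. The NumPy sketch below builds that system for the $\varphi$'s of Table 1; the names `PHI`, `polynomial_basis` and `fit_rbf` are hypothetical, and the polynomial degree (-1 for none, 0 for a constant, 1 for linear) is chosen by the caller.

```python
import numpy as np

# Radial functions of Table 1; gamma is the shape parameter where one is used.
PHI = {
    "linear": lambda r, gamma: r,
    "cubic": lambda r, gamma: r ** 3,
    "thin_plate": lambda r, gamma: r ** 2 * np.log(np.where(r > 0, r, 1.0)),  # 0 at r = 0
    "multiquadric": lambda r, gamma: np.sqrt(r ** 2 + gamma ** 2),
    "gaussian": lambda r, gamma: np.exp(-gamma * r ** 2),
}


def polynomial_basis(x, degree):
    """Columns of P: none (degree -1), constant (0), constant + linear (1)."""
    n = x.shape[0]
    if degree < 0:
        return np.zeros((n, 0))
    cols = [np.ones((n, 1))]
    if degree >= 1:
        cols.append(x)
    return np.hstack(cols)


def fit_rbf(x, f, kind="cubic", degree=1, gamma=1.0):
    """Solve [[Phi, P], [P^T, 0]] [lam; c] = [f; 0]; return s(y) for y of shape (k, d)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat 1-D input as n points in R^1
    f = np.asarray(f, dtype=float)
    phi = PHI[kind]
    Phi = phi(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1), gamma)
    P = polynomial_basis(x, degree)
    n, m = P.shape
    if m == 0:                              # no polynomial term (e.g. Gaussian)
        A, rhs = Phi, f
    else:
        A = np.block([[Phi, P], [P.T, np.zeros((m, m))]])
        rhs = np.concatenate([f, np.zeros(m)])
    sol = np.linalg.solve(A, rhs)
    lam, c = sol[:n], sol[n:]

    def s(y):
        y = np.asarray(y, dtype=float).reshape(-1, x.shape[1])
        r = np.linalg.norm(y[:, None, :] - x[None, :, :], axis=-1)
        return phi(r, gamma) @ lam + polynomial_basis(y, degree) @ c

    return s


# Usage: a cubic RBF with a linear tail reproduces a linear function exactly.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.3]])
s_lin = fit_rbf(pts, 2 + 3 * pts[:, 0] - pts[:, 1], kind="cubic", degree=1)
print(s_lin([0.2, 0.7]))   # approximately 1.9 = 2 + 3 * 0.2 - 0.7
```

The side condition on $\lambda$ is what makes linear data reproduce exactly: for $f = p$ the solution is $\lambda = 0$, so the radial part adds no spurious bumps.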

The functions $\varphi$ considered by Jones are defined by one of the formulas in Table 1.
Depending on the choice of $\varphi$, the RBF function $s$ will be indicated with the adjective in
the column "RBF type". The degree of the polynomials is restricted depending on the
choice of $\varphi$, in relation to the interpolating capabilities of the corresponding RBF, as shown in
the column "Maximal degree" of Table 1, cf. [0].

RBF type            $\varphi(r)$                                          Maximal degree of $p$
linear              $r$                                                   0
cubic               $r^3$                                                 1
thin plate spline   $r^2 \log(r)$ if $r > 0$;  $0$ if $r = 0$             1
multiquadric        $\sqrt{r^2 + \gamma^2}$                               0
Gaussian            $e^{-\gamma r^2}$                                     no $p$ needed

Table 1. Different choices for $\varphi$, and the maximal degree of $p$.

2.1. Balance between exploration and minimum search. Given the previously computed
values of the objective function $f(x_1), \dots, f(x_n)$ at the data points $x_1, \dots, x_n$, the
problem is to choose the next evaluation point. The idea is to start from an "estimate" of
the value of the global minimum, then choose a "reasonable" candidate for the location
of the global minimum based upon a combination of this estimate, the response surface,
and the positions of the previously computed data points. The main concern is to limit the
introduction of artificial minima. The choice is made such that, within the class of
interpolating functions, a measure of "bumpiness" is minimal.

Let us have $n$ evaluation points of the objective function and its values there, $\{(x_i, f(x_i)) \mid i = 1, \dots, n\}$.
Denote by $R_n$ the response surface that interpolates those points. Assume that $f^*$
is some value below the calculated global minimum $R_{\min} := \min R_n$ of the response surface
$R_n$; $R_{\min}$ is an estimate of the global minimum of the objective function. Since $R_n$
interpolates $f$ at the $x_i$, $R_{\min}$ is equal to or below $f_{\min} := \min_i f(x_i)$. We now consider
the response surface $R_x$ that interpolates the objective function at the points
$\{(x_i, f(x_i)) \mid i = 1, \dots, n\}$ and takes the value $f^*$ at $x$. This response surface $R_x$ may be
considered as a function of $x$. The bumpiness of $R_x$ depends on the interpolating method and
on the location of the (fixed) evaluation points $\{x_i\}$. It also depends on $x$ and on the
hypothetical value $f^*$. If $f^*$ is close to $f(x_i)$ for some $x_i$, we may choose $x$ near $x_i$
without introducing steep gradients and without a large effect on the bumpiness of $R_x$.
However, if $f^*$ is far from every $f(x_i)$, we cannot choose $x$ near any $x_i$ without a large
increase in bumpiness. This can guide the selection of the next evaluation point. If we
constrain the search so that the increase in bumpiness of $R_x$ is as small as possible,
then a choice of $f^*$ slightly below the calculated estimate of the minimum of $R_n$ is likely to
result in an $x$ near the location of this minimum. On the other hand, if the target value $f^*$
is far below, this may result in a point far away, in a yet unexplored part of the domain of
interest.
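The effect of the target value can be illustrated numerically. The sketch below is a hedged illustration, not Gutmann's implementation: it assumes a 1-D Gaussian RBF without polynomial term, takes $\lambda^\top \Phi \lambda$ (equal to $\lambda^\top F$ because $\Phi\lambda = F$) as the measure of bumpiness, replaces the global searches by a scan over a grid, and uses made-up data that only cover $[0, 1]$ of the domain $[0, 2]$. All function names are hypothetical.

```python
import numpy as np

GAMMA = 25.0  # Gaussian shape parameter (illustrative choice)


def gauss(a, b, gamma=GAMMA):
    """Matrix phi(|a_i - b_j|) for 1-D point sets, phi(r) = exp(-gamma r^2)."""
    return np.exp(-gamma * (np.asarray(a)[:, None] - np.asarray(b)[None, :]) ** 2)


def surrogate(x, f, y, gamma=GAMMA):
    """Gaussian RBF interpolant R_n of (x, f), evaluated at the points y."""
    lam = np.linalg.solve(gauss(x, x, gamma), f)
    return gauss(y, x, gamma) @ lam


def bumpiness(x, f, gamma=GAMMA):
    """lambda^T Phi lambda (= lambda^T f) of the Gaussian RBF interpolating (x, f)."""
    lam = np.linalg.solve(gauss(x, x, gamma), f)
    return float(lam @ f)


def next_point(x, f, target, candidates, gamma=GAMMA):
    """Candidate c minimizing the bumpiness of the RBF through (x, f) and (c, target)."""
    best, best_b = None, np.inf
    for c in candidates:
        if np.min(np.abs(x - c)) < 1e-9:       # a repeated point makes Phi singular
            continue
        b = bumpiness(np.append(x, c), np.append(f, target), gamma)
        if b < best_b:
            best, best_b = float(c), b
    return best


x_data = np.linspace(0.0, 1.0, 6)              # data only in [0, 1]
f_data = (x_data - 0.5) ** 2 - 1.0
grid = np.linspace(0.0, 2.0, 41)               # ]1, 2] is unexplored
r_min = float(surrogate(x_data, f_data, grid).min())   # estimate R_min

x_near = next_point(x_data, f_data, r_min - 0.01, grid)  # target slightly below R_min
x_far = next_point(x_data, f_data, r_min - 1e6, grid)    # target far below R_min
print(x_near, x_far)
```

In this setup the target just below $R_{\min}$ is expected to select a point near the minimum of $R_n$ inside $[0, 1]$, while the far-off target pushes the choice into the unexplored interval $]1, 2]$, where adding a point barely constrains the existing interpolant.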

Response Surface GOP using Radial Basis Functions, the Basic Algorithm:

• Choose a number of initial evaluation points.
  Evaluate the objective function at the evaluation points.
  Create a surrogate function: determine the RBF interpolating the objective
  function at the evaluation points.

• (1) Search for a new evaluation point:
  - determine the global minimum of the surrogate function $R_n$ and decide
    upon a target value $T$ lower than this minimum,
  - find a point $x^*$ that minimizes the chosen measure of bumpiness of the
    RBF interpolating $\{(x_i, f(x_i)) \mid i = 1, \dots, n\} \cup \{(x, T)\}$.
  (2) Evaluate the objective function at this new evaluation point, and update the
  surrogate function.

• Repeat (1) and (2) until satisfaction or exhaustion.
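Putting the steps together, the RBF algorithm can be sketched as follows. This is only a minimal sketch under assumptions not taken from the text: a 1-D Gaussian RBF with fixed $\gamma$, bumpiness $\lambda^\top \Phi \lambda$, exhaustive grid search in place of both global searches, and a fixed cycle of offsets $T = R_{\min} - \delta$ with $\delta$ going from large (exploration) to small (minimum search). The function names `rbf_gop` and `f_demo` are hypothetical.

```python
import numpy as np


def _gauss(a, b, gamma):
    """Matrix phi(|a_i - b_j|) with phi(r) = exp(-gamma r^2), 1-D points."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)


def rbf_gop(f, x_init, grid, n_iter, deltas=(1e3, 1.0, 0.1, 0.01), gamma=25.0):
    """Hypothetical RBF GOP sketch on a 1-D candidate grid."""
    x = np.asarray(x_init, dtype=float)
    y = np.array([f(v) for v in x])
    for k in range(n_iter):
        lam = np.linalg.solve(_gauss(x, x, gamma), y)
        r_min = float((_gauss(grid, x, gamma) @ lam).min())  # min of R_n on the grid
        target = r_min - deltas[k % len(deltas)]              # T below that minimum
        best_c, best_b = None, np.inf
        for c in grid:
            if np.min(np.abs(x - c)) < 1e-9:                  # skip known points
                continue
            xa, ya = np.append(x, c), np.append(y, target)
            b = float(np.linalg.solve(_gauss(xa, xa, gamma), ya) @ ya)  # bumpiness
            if b < best_b:
                best_c, best_b = float(c), b
        x = np.append(x, best_c)
        y = np.append(y, f(best_c))                           # expensive evaluation
    i = int(np.argmin(y))
    return float(x[i]), float(y[i]), x


def f_demo(x):
    return float(np.sin(5.0 * x) + 0.5 * x)


x_best, y_best, hist = rbf_gop(f_demo, [0.0, 1.0, 2.0], np.linspace(0.0, 2.0, 41), n_iter=8)
```

The cycle of offsets is one simple way to alternate between exploration and local refinement; any schedule that keeps $T$ strictly below $R_{\min}$ fits the algorithm as stated.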

3. Response Surface GOP using Kriging 

A wide research field has emerged around the use of probabilistic models, such as Kriging,
in global optimization. A concise description of the use of Kriging in the context
of global optimization can be found in [0], whereas a more elaborate treatment is given in
[O]. In these papers, validity is argued using statistical terminology and frequentist
interpretations. The statistical basis for spatial data analysis can be found in [P] and [S].

Kriging starts from the fundamental hypothesis that the objective function to be
optimized is a realization of a random field $Y$. For convenience, it is assumed that this
random field is square integrable. As a consequence, the modelling of the random field can
be split into the modelling of a random field with zero expectation at each point of its domain,
and the modelling of the expectation.



The only information about the objective function resides in the calculated values at the
observation points. For each random variable $Y_{x_i}$ located at an observation point $x_i$ only
one observation is available: the calculated value $f(x_i)$ of the objective function at this
point. Meaningful statistics, however, are difficult to extract from just one value.
To overcome this problem, additional model hypotheses are needed that link the random
variables at the different observation points. One approach is to adopt a form of
"stationarity" hypothesis. "Stationarity" stems from the time-series setting,
meaning that the probabilistic structure in some sense does not depend on a shift in time,
and that the covariance between the variables at different moments in time depends only on
the time lag. In the analysis of spatial data this comes down to the assumption that the
probabilistic structure does not depend on the location, and that the covariance between two
variables at different places in one direction depends only on the distance between them.

It is assumed that the mean and the variance of the variables of the random field are
constant for all locations $u$ in the region of interest $D$ of the GOP: $E[Y(u)] = \mu$, $\operatorname{var}(Y(u)) = \sigma^2$.
The next step is to introduce a parametrized model for the covariance structure of the
random field:

$$\operatorname{cov}(Y(u), Y(v)) = \sigma^2 \exp\Big(-\sum_{j=1}^{d} \theta_j\, |u_j - v_j|^{p_j}\Big), \qquad (3.1)$$

with $u$ and $v$ points in the acceptance region $D$. The last step is to introduce a probability
density hypothesis: for all $u_1, \dots, u_m$ in $D$ the multivariate random variable $(Y(u_1), \dots, Y(u_m))$
follows a multivariate normal distribution. The parameters $\theta_1, \dots, \theta_d > 0$, $p_1, \dots, p_d > 0$, $\mu$ and
$\sigma^2$ are to be estimated from the partial observation. This estimation can be done using a
"two-step" log-likelihood estimation technique described in [5] and […]. The estimation in
itself is a non-trivial global optimization problem. Starting from the estimated parameters, a
prediction can be made for the unobserved part of the realization of the random field. This
results in a function $s$ of the following form that interpolates the partial realization:

$$s(x) = \hat{\mu} + \sum_{i=1}^{n} c_i \exp\Big(-\sum_{j=1}^{d} \theta_j\, |x_j - x_{i,j}|^{p_j}\Big),$$

where $x_{i,j}$ denotes the $j$-th coordinate of the observation point $x_i$. The Kriging interpolating
function $s$ is thus a linear combination of basis functions centred at the observation points
and a constant.
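For fixed $\theta_j$ and $p_j$ the predictor has a closed form. The NumPy sketch below assumes these parameters are given: it skips the likelihood maximization and only returns the concentrated log-likelihood that such a search would maximize. It uses the generalized-least-squares estimate of $\mu$ and the usual Kriging predictive standard deviation; `corr` and `fit_kriging` are hypothetical names.

```python
import numpy as np


def corr(a, b, theta, p):
    """exp(-sum_j theta_j |a_j - b_j|^p_j) for all pairs of rows of a (k, d) and b (n, d)."""
    diff = np.abs(a[:, None, :] - b[None, :, :])
    return np.exp(-np.sum(theta * diff ** p, axis=-1))


def fit_kriging(x, y, theta, p):
    """Kriging predictor for given theta, p (the likelihood maximization is omitted)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    R = corr(x, x, theta, p)
    Rinv_1 = np.linalg.solve(R, np.ones(n))
    mu = np.linalg.solve(R, y).sum() / Rinv_1.sum()      # GLS estimate of mu
    c = np.linalg.solve(R, y - mu)                       # coefficients c_i of s(x)
    sigma2 = float((y - mu) @ c) / n                     # ML estimate of sigma^2
    # Concentrated log-likelihood (up to a constant): the quantity a first step
    # would maximize over theta and p before mu and sigma^2 are plugged in.
    loglik = -0.5 * n * np.log(sigma2) - 0.5 * np.linalg.slogdet(R)[1]

    def predict(z):
        """Predicted mean s(z) and standard deviation at the rows of z."""
        z = np.asarray(z, dtype=float)
        r = corr(z, x, theta, p)                         # (k, n)
        Rinv_r = np.linalg.solve(R, r.T)                 # (n, k)
        mean = mu + r @ c
        var = sigma2 * (1.0 - np.sum(r.T * Rinv_r, axis=0)
                        + (1.0 - Rinv_r.sum(axis=0)) ** 2 / Rinv_1.sum())
        return mean, np.sqrt(np.maximum(var, 0.0))

    return predict, mu, sigma2, loglik


x_obs = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
y_obs = np.sin(6.0 * x_obs[:, 0])
predict, mu_hat, sigma2_hat, loglik = fit_kriging(
    x_obs, y_obs, theta=np.array([10.0]), p=np.array([2.0]))
mean, std = predict(np.array([[0.25], [0.6]]))
```

The predictor interpolates the observations (zero predicted standard deviation there) and reports a positive standard deviation in between, which is what the selection criteria of the next subsection build on.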

3.1. Balance between exploration and minimum search. Together with an estimate
of the objective function, which is a realization of the random field, several auxiliary
functions can be estimated at each point in the acceptance region, such as the lower and upper
confidence bound functions, the probability of improvement, and the expected improvement.
The probability of improvement and the expected improvement are suited for finding a new
observation point. The probability of improvement, given a target value $T$,
$P[Y(x) < T]$, is the probability that the value at a certain point will be lower
than the chosen target $T$. The expected improvement at a point $x$, $E[\max(f_{\min} - Y(x), 0)]$, is the
expectation of the amount by which the random field $Y(x)$ falls below the constant value $f_{\min}$.
The probability of improvement is used in a GOP algorithm by choosing target values $T$ that
alternate between values significantly smaller than $f_{\min}$ and values closely approaching $f_{\min}$.
Maximizing the probability of improvement for a well-chosen cycle of $T$ balances scanning
unexplored areas against belief in the response surface as an approximation of the objective
function. Maximizing the expected improvement, by contrast, does not require a target value to be set.
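Under the normality hypothesis both criteria have closed forms in terms of the predicted mean $\hat{y}(x)$ and standard deviation $\hat{s}(x)$ of $Y(x)$. A small standard-library sketch, with hypothetical function names:

```python
import math


def _cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probability_of_improvement(mean, std, target):
    """P[Y(x) < T] for Y(x) ~ N(mean, std^2)."""
    if std <= 0.0:
        return 1.0 if mean < target else 0.0
    return _cdf((target - mean) / std)


def expected_improvement(mean, std, f_min):
    """E[max(f_min - Y(x), 0)] for Y(x) ~ N(mean, std^2)."""
    if std <= 0.0:
        return max(f_min - mean, 0.0)
    z = (f_min - mean) / std
    return (f_min - mean) * _cdf(z) + std * _pdf(z)
```

The first term of the expected improvement rewards a low predicted mean (minimum search), while the second grows with $\hat{s}(x)$ and therefore rewards uncertain, unexplored regions (exploration).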

Response Surface GOP using Kriging, the Basic Algorithm:

• Choose a number of initial evaluation points.
  Evaluate the objective function at the evaluation points.
  Create a surrogate function based upon the evaluation points and their evaluations:
  - estimate the scaling and smoothness parameters by maximizing the likelihood
    function,
  - calculate the coefficients of the Kriging interpolating function.

• (1) Search for a new evaluation point:
  - determine $f_{\min}$, the minimum of the currently known values, decide
    upon a target value $T$ lower than $f_{\min}$, and find a point $x^*$ that maximizes
    the probability of improvement with respect to $T$,
  - or find a point $x^*$ that maximizes the expected improvement $E[\max(f_{\min} - Y(x), 0)]$.
  (2) Evaluate the objective function at this new evaluation point, and update the
  surrogate function.

• Repeat (1) and (2) until satisfaction or exhaustion.
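The expected-improvement variant of this loop can be sketched as follows. It is a simplified, hypothetical sketch: the parameters $\theta$ and $p$ are held fixed instead of being re-estimated by maximum likelihood in every iteration, the problem is 1-D, and the maximization of the expected improvement is an exhaustive search over a grid.

```python
import math

import numpy as np


def _corr(a, b, theta, p):
    """Kriging correlation exp(-theta |a_i - b_j|^p) for 1-D points."""
    return np.exp(-theta * np.abs(a[:, None] - b[None, :]) ** p)


def _expected_improvement(mean, std, f_min):
    """Vectorized E[max(f_min - Y, 0)] for Y ~ N(mean, std^2)."""
    std = np.maximum(std, 1e-12)                  # avoid division by zero
    z = (f_min - mean) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (f_min - mean) * cdf + std * pdf


def kriging_gop(f, x_init, grid, n_iter, theta=10.0, p=2.0):
    """Hypothetical Kriging GOP sketch: fixed theta, p; next point maximizes EI."""
    x = np.asarray(x_init, dtype=float)
    y = np.array([f(v) for v in x])
    for _ in range(n_iter):
        R = _corr(x, x, theta, p)
        Rinv_1 = np.linalg.solve(R, np.ones(len(x)))
        mu = np.linalg.solve(R, y).sum() / Rinv_1.sum()   # GLS mean
        c = np.linalg.solve(R, y - mu)
        sigma2 = float((y - mu) @ c) / len(x)             # ML variance
        cand = np.array([g for g in grid if np.min(np.abs(x - g)) > 1e-9])
        r = _corr(cand, x, theta, p)                      # (k, n)
        mean = mu + r @ c
        Rinv_r = np.linalg.solve(R, r.T)                  # (n, k)
        var = sigma2 * (1.0 - np.sum(r.T * Rinv_r, axis=0)
                        + (1.0 - Rinv_r.sum(axis=0)) ** 2 / Rinv_1.sum())
        ei = _expected_improvement(mean, np.sqrt(np.maximum(var, 0.0)), y.min())
        x_new = float(cand[int(np.argmax(ei))])
        x = np.append(x, x_new)
        y = np.append(y, f(x_new))                        # expensive evaluation
    i = int(np.argmin(y))
    return float(x[i]), float(y[i]), x


def f_demo(x):
    return float(np.sin(5.0 * x) + 0.5 * x)


x_best, y_best, hist = kriging_gop(f_demo, [0.0, 1.0, 2.0], np.linspace(0.0, 2.0, 41), n_iter=8)
```

Re-estimating $\theta$ and $p$ inside the loop, as the algorithm prescribes, would add a second global optimization per iteration on top of the expected-improvement search.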

4. Discussion and Comparison of RBF and Kriging GOP in Inverse Problems

Kriging replaces the linear combination of radially symmetric functions about the
observation points by a linear combination of functions with "directional" symmetry about the
observation points. When the parameters of the Kriging model are restricted by requiring
that all parameters $\theta_j$ are equal and all $p_j$ equal 2, the interpolating estimate becomes a
(Gaussian) RBF. Even with these restrictions, the Kriging algorithm does not become an RBF GOP:
the algorithms still differ in the criterion used to select a new observation point.
The search for the new observation point in the RBF GOP stems from an energy argument
and the direct need for an automatic equilibrium between exploration and belief. In
the Kriging GOP, on the other hand, this equilibrium is a consequence of criteria developed
using probabilistic reasoning. The basic assumption underlying this probabilistic theory is
that the observations are part of one realization of a random field.
In a large class of inverse problems, the objective function is deterministic. Although it
is not hard to introduce a random field with a given objective function as a possible
outcome, it is difficult to interpret probabilistic statements in a meaningful way if
the random field is not theoretically linked to the problem at hand. Even if the objective
function can be interpreted as the outcome of a random field, a second, more fundamental
problem related to the probabilistic theory emerges. In the iterative process of finding a
global optimum, the observation points are chosen so as to obtain a sequence of values of
the objective function that converges to its global minimum. Thus each new observation
point is chosen based upon all previous observations. Nevertheless, in the development
of the parameter estimators, the Kriging approach treats the observation points and their
function values as if the position of an observation point in the sequence did not depend
on the values of the objective function at the previous observation points.
When searching for a global minimum, most evaluations can be expected to lie
below the "average" of the complete but unknown realization (the objective function).
As a result, the mean will be underestimated. When the sequence of
function values converges to the global minimum, the variance will be
underestimated too.
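The reduction of the Kriging correlation to a radial basis function can be checked numerically. In the sketch below (hypothetical helper names), the correlation $\exp(-\sum_j \theta_j |u_j - v_j|^{p_j})$ takes the same value at three points at Euclidean distance 1 from $u$ when all $\theta_j$ are equal and all $p_j = 2$, and then coincides with the Gaussian $e^{-\gamma r^2}$ with $\gamma = \theta_j$; with unequal $\theta_j$, or with equal $p_j \neq 2$, it no longer depends on the Euclidean distance alone.

```python
import numpy as np


def kriging_corr(u, v, theta, p):
    """exp(-sum_j theta_j |u_j - v_j|^p_j) for two points u, v."""
    d = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    return float(np.exp(-np.sum(np.asarray(theta, dtype=float) * d ** np.asarray(p, dtype=float))))


def gaussian_rbf(r, gamma):
    """phi(r) = exp(-gamma r^2)."""
    return float(np.exp(-gamma * r ** 2))


u = np.zeros(2)
a = np.array([1.0, 0.0])                     # Euclidean distance 1 from u
b = np.array([0.0, 1.0])                     # Euclidean distance 1 from u
c = np.array([1.0, 1.0]) / np.sqrt(2.0)      # Euclidean distance 1 from u

iso = [kriging_corr(u, q, [3.0, 3.0], [2.0, 2.0]) for q in (a, b, c)]   # radial
aniso = [kriging_corr(u, q, [3.0, 0.5], [2.0, 2.0]) for q in (a, b)]    # directional
p_one = [kriging_corr(u, q, [3.0, 3.0], [1.0, 1.0]) for q in (a, c)]    # equal p_j = 1
print(iso, gaussian_rbf(1.0, 3.0), aniso, p_one)
```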

In the Kriging procedure the basis functions are given an orientational preference, which
only appears to be a good idea if the eigenvalues and eigenvectors of the Hessian matrix of
the objective function are similar at every local minimum. When scouting around a potential
local minimum, the behavior of the objective function in the neighborhood of this minimum will
tend to dominate the estimates. If there is no clear evidence that this behavior can be
generalized to all points in the acceptance region, certain regions will tend to be
approximated less favorably than in an approach without orientational adaptation.
In geological applications this re-scaling option agrees with the fact that the "depth"
coordinate, defined by the gravitational force, is to be treated distinctly from the two
surface coordinates perpendicular to it. However, this does not necessarily apply to
other applications.

What remains is that in the Kriging GOP approach each of the criteria to choose the new
evaluation point seeks a balance between exploration and belief in the response surface
as a surrogate. Under certain conditions, maximization of the expected improvement
generates a set of evaluation points that is dense in the acceptance region, and thus will
come close to the global minimum [Z]. These properties also remain true outside the Kriging
setting, independently of the statistical validity arguments. There is no need to confine
the criteria used in the selection of new evaluation points exclusively to the Kriging GOP
setting. Instead, it can be useful to examine their performance in an RBF GOP setting as well.

The RBF GOP algorithm replaces the original GOP problem by a sequence of GOP
problems with a relatively simple update from one iteration to the next, cf. [1]. The
Kriging GOP algorithm, in contrast, has to solve two GOP problems in every iteration,
the first being the non-trivial estimation of the model parameters.

In inverse problems there is a clear need for regularization. When working with
Gaussian RBFs, this can be done by controlling the width of the radial basis functions via
the parameter $\gamma$. In the Kriging algorithm, by contrast, the adaptation of the parameters to the
observations might cause the local behavior of the response surface to be modelled at too
detailed a level, cancelling out the regularization effect needed to compute a meaningful
solution of ill-posed inverse problems.

5. Conclusions 

The application of a probabilistic model to find the solution of an inverse problem is not
self-evident, from either a theoretical or a practical point of view. Not only is there a clear
cost in adapting the basis functions to the observations, but the adaptation itself is not
necessarily desirable, because it may cancel the underlying regularizing effects and might
model the "noise". Instead, it could be interesting to use adaptive RBFs with automatic
updating based upon the data, incorporating the fact that the sequence of observation points
is not random. When doing so, safeguards have to be built in to retain the necessary
regularization properties of the method. The criterion used in the Kriging algorithm to select
the new iteration point may be adapted to the RBF GOP algorithm.